Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation

نویسندگان

  • Roland Kuhn
  • Denis Yuen
  • Michel Simard
  • Patrick Paul
  • George F. Foster
  • Eric Joanis
  • Howard Johnson
چکیده

This paper presents a new approach to distortion (phrase reordering) in phrasebased machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These “segment choice” models (SCMs) can be trained on “segment-aligned” sentence pairs; they can be applied during decoding or rescoring. The approach yields a metric called “distortion perplexity” (“disperp”) for comparing SCMs offline on test data, analogous to perplexity for language models. A decision-tree-based SCM is tested on Chinese-to-English translation, and outperforms a baseline distortion penalty approach at the 99% confidence level.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

PORTAGE: with Smoothed Phrase Tables and Segment Choice Models

Improvements to Portage and its participation in the shared task of NAACL 2006 Workshop on Statistical Machine Translation are described. Promising ideas in phrase table smoothing and global distortion using feature-rich models are discussed as well as numerous improvements in the software base.

متن کامل

Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation

Syntax-based Machine Translation systems have recently become a focus of research with much hope that they will outperform traditional Phrase-Based Statistical Machine Translation (PBSMT). Toward this goal, we present a method for analyzing the morphosyntactic content of language from an Elicitation Corpus such as the one available in the LDC’s LCTL language packs. The presented method discover...

متن کامل

Feature-Rich Phrase-based Translation: Stanford University's Submission to the WMT 2013 Translation Task

We describe the Stanford University NLP Group submission to the 2013 Workshop on Statistical Machine Translation Shared Task. We demonstrate the effectiveness of a new adaptive, online tuning algorithm that scales to large feature and tuning sets. For both English-French and English-German, the algorithm produces feature-rich models that improve over a dense baseline and compare favorably to mo...

متن کامل

Discriminative Feature-Rich Modeling for Syntax-Based Machine Translation

State-of-the-art statistical machine translation systems are most frequently built on phrasebased (Koehn et al., 2003) or hierarchical translation models (Chiang, 2005). In addition, a wide variety of models exploiting syntactic annotation on either the source or target side (or both) have recently been developed and also give state-of-the-art performance (Galley et al., 2006; Zollmann and Venu...

متن کامل

Dynamic distortion in a discriminative reordering model for statistical machine translation

Most phrase-based statistical machine translation systems use a so-called distortion limit to keep the size of the search space manageable. In addition, a distance-based distortion penalty is used as a feature to keep the decoder to translate monotonically unless there is sufficient support for a jump from other features, particularly the language models. To overcome the issue of setting the op...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006